Enhanced Constrained Run-Length Algorithm for Complex Layout Document Processing

Author

  • Hung-Ming Sun
Abstract

The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. It partitions documents with Manhattan layouts very efficiently but is not suited to pages with complex layouts, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper presents a solution to this problem by adding global information to the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts. It can also extract text surrounded by a box. Neither case can be processed by the original CRLA.
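
To make the "local information" point concrete, here is a minimal sketch of the smearing step of the original CRLA as it is commonly described: white runs no longer than a threshold are filled when they lie between two black pixels, horizontally and vertically, and the two results are combined with a logical AND. This is not the enhanced method of the paper, whose details the abstract does not give. The function names, the 0/1 list-of-lists image representation, and the threshold values are assumptions made for illustration only.

```python
def smear_row(row, max_gap):
    """Fill white runs (0s) of length <= max_gap lying between two black pixels (1s)."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= max_gap:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_horizontal(image, max_gap):
    """Apply the run-length constraint to every row."""
    return [smear_row(row, max_gap) for row in image]


def smear_vertical(image, max_gap):
    """Apply the run-length constraint to every column (transpose, smear, transpose back)."""
    columns = [list(col) for col in zip(*image)]
    smeared = [smear_row(col, max_gap) for col in columns]
    return [list(row) for row in zip(*smeared)]


def crla_smear(image, h_gap, v_gap):
    """Combine horizontal and vertical smearing with a pixel-wise AND.

    h_gap and v_gap are hypothetical threshold names; real values depend on
    resolution and font size.
    """
    horizontal = smear_horizontal(image, h_gap)
    vertical = smear_vertical(image, v_gap)
    return [[a & b for a, b in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


# A short gap is bridged, a long one is not:
print(smear_row([1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1], 2))
# -> [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
```

Each fill decision depends only on the length of one gap within a single row or column. The procedure has no way of telling whether the black pixels on either side of the gap belong to text or to a graphic, which is the kind of erroneous linkage the abstract attributes to the original algorithm.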

Similar resources

Selective CRLA based Layout Analysis and Text Region Extraction from Low Quality Document Images

This paper aims to detect textual regions by separating out graphical regions, using a Selective CRLA scheme and statistical textual properties, on noisy, low-resolution newspaper images. A bottom-up approach is adopted: the Selective Constrained Run-Length Algorithm (CRLA) is applied to obtain the layout, and a region-growing method applied over it segments the homogeneous regions. Statistical...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths

In this paper, we strive towards the development of efficient techniques for segmenting document pages resulting from the digitization of historical machine-printed sources. Documents of this kind often suffer from low quality and local skew, show several degradations due to the quality of the old printing matrix or to ink diffusion, and exhibit complex and dense layouts. To face these problems, we introd...

Skew detection for complex document images using robust borderlines in both text and non-text regions

A new skew detection method for complex document images based on robust borderlines extracted from both text and non-text regions is proposed in this paper (doi:10.1016/j.patrec.2008.06.008). First, bor...

Newspaper Headlines Extraction from Microfilm Images

Automatic indexing is important for a digital library to provide digitized manuscripts of old document images and their electronic text. As an essential step in creating such a system, this paper discusses the issue of extracting headlines from old newspaper microfilms. Most research on document layout analysis has assumed relatively clean images. However, microfilm images of old newspap...

Publication date: 2007